data mining primitive
Data Mining Primitives or Tasks
The first primitive is the specification of the data on which mining is to be performed. Typically, a user is interested in only a subset of the database. It is impractical to mine the entire database, particularly since the number of patterns generated could be exponential with respect to the database size. Furthermore, many of the patterns found would be irrelevant to the interests of the user. In a relational database, the set of task-relevant data can be collected via a relational query involving operations like selection, projection, join, and aggregation. This retrieval of data can be thought of as a "subtask" of the data mining task. The data collection process results in a new data relation called the initial data relation. The initial data relation can be ordered or grouped according to the conditions specified in the query. The data may be cleaned or transformed (e.g.